ATOM Documentation


Gap Analysis: External API Integrations for Costs, Health, and Benchmarks

**Date:** 2026-04-01

**Comparison:** atom-saas (SaaS) vs atom-upstream (Open Source)

**Scope:** Costs, Provider Health, Benchmark Data

---

Executive Summary

**Key Finding:** Upstream has **DynamicPricingFetcher** that integrates with **LiteLLM** and **OpenRouter APIs** for real-time cost data. SaaS uses hardcoded costs. Neither uses external APIs for health monitoring or benchmarks.

**Critical Gap:** SaaS is missing the DynamicPricingFetcher integration, meaning:

  • Pricing updates require code changes
  • No automatic price syncing when providers change rates
  • Missing cache-aware routing features
  • No prompt caching optimization data

---

1. Cost Tracking

✅ Upstream (atom-upstream)

**File:** atom-upstream/backend/core/dynamic_pricing_fetcher.py

**External APIs:**

  1. **LiteLLM GitHub** - https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json
  • Fetches a comprehensive pricing database
  • Updated regularly by the LiteLLM community
  • Includes 100+ models across all providers
  2. **OpenRouter API** - https://openrouter.ai/api/v1/models
  • Real-time model pricing
  • Provider: OpenRouter
  • Fallback when LiteLLM data is missing

**Features:**

```python
class DynamicPricingFetcher:
    async def refresh_pricing(self, force: bool = False) -> Dict[str, Any]:
        # Fetch from both sources
        litellm_pricing = await self.fetch_litellm_pricing()
        openrouter_pricing = await self.fetch_openrouter_pricing()

        # Merge pricing (LiteLLM takes precedence)
        self.pricing_cache = {**openrouter_pricing, **litellm_pricing}

        # Save to cache (24-hour TTL)
        self._save_cache()
        return self.pricing_cache
```

**Cache Strategy:**

  • Local file cache: ./data/ai_pricing_cache.json
  • 24-hour TTL before refresh
  • Singleton pattern for efficiency

**Advanced Features:**

  • model_supports_cache(model_name) - Check if model supports prompt caching
  • get_cache_min_tokens(model_name) - Minimum tokens for caching (1024 OpenAI, 2048 Anthropic)
  • is_pricing_estimated(model_name) - Distinguish official vs estimated pricing
  • get_cheapest_models(limit) - Find lowest-cost models
  • compare_providers() - Compare average costs across providers

**Usage in Upstream:**

```python
# Integrated into BYOKHandler and routing logic
from core.dynamic_pricing_fetcher import get_pricing_fetcher

fetcher = get_pricing_fetcher()
pricing = await fetcher.refresh_pricing()
cost = fetcher.estimate_cost("gpt-4o", 1000, 500)
```
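The arithmetic behind `estimate_cost` is simple per-token math. A sketch using the LiteLLM schema's `input_cost_per_token` / `output_cost_per_token` fields; the method's actual internals are an assumption, and the rates below are illustrative:

```python
def estimate_cost(pricing: dict, model: str,
                  input_tokens: int, output_tokens: int) -> float:
    """Cost = input tokens x input rate + output tokens x output rate."""
    entry = pricing[model]
    return (input_tokens * entry["input_cost_per_token"]
            + output_tokens * entry["output_cost_per_token"])


# Illustrative rates of $2.50 / $10.00 per million tokens:
sample = {"gpt-4o": {"input_cost_per_token": 2.5e-06,
                     "output_cost_per_token": 1.0e-05}}
cost = estimate_cost(sample, "gpt-4o", 1000, 500)  # 0.0025 + 0.0050
```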

❌ SaaS (atom-saas)

**Files:**

  • backend-saas/core/llm/embedding/providers.py - Embedding costs
  • backend-saas/core/llm/byok_handler.py - LLM routing (hardcoded)

**Current Implementation:**

```python
# backend-saas/core/llm/embedding/providers.py
MODELS = {
    "text-embedding-3-small": {
        "cost_per_1m_tokens": 0.02,  # ❌ HARDCODED
    },
    "text-embedding-3-large": {
        "cost_per_1m_tokens": 0.13,  # ❌ HARDCODED
    },
    # ... similar for Cohere, Voyage, Nomic, Jina
}
```

**Problems:**

  • Pricing becomes outdated when providers change rates
  • Requires code deployment to update costs
  • No automatic synchronization
  • No cache-aware routing optimizations

**Missing Features:**

  • ❌ Dynamic pricing updates from external APIs
  • ❌ LiteLLM integration
  • ❌ OpenRouter fallback
  • ❌ Prompt caching support detection
  • ❌ Cost comparison across providers
  • ❌ Cheapest model discovery

---

2. Provider Health Monitoring

✅ Both Implementations (Similar)

**Upstream:** atom-upstream/backend/core/provider_health_monitor.py

**SaaS:** backend-saas/core/llm/registry/provider_health.py

**Implementation:** Both use **INTERNAL** tracking (no external APIs)

**Similar Features:**

  • Success/error rate tracking
  • Latency monitoring (rolling average)
  • Consecutive failure detection
  • Health score calculation (0.0-1.0 scale)

**Upstream (ProviderHealthMonitor):**

```python
class ProviderHealthMonitor:
    def record_call(self, provider_id: str, success: bool, latency_ms: float):
        # Track in sliding window (5 minutes default)
        history.append((timestamp, success, latency_ms))

        # Calculate health: 70% success_rate + 30% latency_score
        health_score = (success_rate * 0.7) + (latency_score * 0.3)
```
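Filling in the elided pieces of the excerpt above, the sliding-window approach looks roughly like this. A self-contained sketch, not upstream's code: the eviction logic and the 5000 ms latency normalizer are assumptions; only the 5-minute window and 70/30 weighting come from the document.

```python
import time
from collections import deque


class SlidingWindowHealthMonitor:
    """Sketch of in-memory health tracking with a bounded window."""

    def __init__(self, window_seconds: float = 300.0,
                 max_latency_ms: float = 5000.0):
        self.window_seconds = window_seconds
        self.max_latency_ms = max_latency_ms  # assumed normalizer
        self.history: dict[str, deque] = {}

    def record_call(self, provider_id: str, success: bool,
                    latency_ms: float) -> None:
        history = self.history.setdefault(provider_id, deque())
        now = time.monotonic()
        history.append((now, success, latency_ms))
        # Evict entries older than the window so memory stays bounded
        while history and now - history[0][0] > self.window_seconds:
            history.popleft()

    def health_score(self, provider_id: str) -> float:
        history = self.history.get(provider_id)
        if not history:
            return 1.0  # no data: assume healthy
        successes = sum(1 for _, ok, _ in history if ok)
        success_rate = successes / len(history)
        avg_latency = sum(lat for _, _, lat in history) / len(history)
        latency_score = max(0.0, 1.0 - avg_latency / self.max_latency_ms)
        # 70% weight on success rate, 30% on latency (per upstream)
        return success_rate * 0.7 + latency_score * 0.3
```

The eviction-on-write is what prevents the memory leaks noted as the key difference below.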

**SaaS (ProviderHealthService):**

```python
class ProviderHealthService:
    async def record_success(self, provider: str, latency_ms: float):
        # Track in Redis with 1-hour TTL
        # Rolling average latency calculation
        # Health state transitions (HEALTHY/DEGRADED/UNHEALTHY)
```

**Key Difference:**

  • Upstream: In-memory deque with sliding window (prevents memory leaks)
  • SaaS: Redis-backed with tenant isolation

**Neither uses external APIs** for:

  • ❌ Provider status pages (e.g., status.openai.com)
  • ❌ Uptime monitoring services
  • ❌ Third-party health check APIs

---

3. Benchmark Data

❌ Both Implementations (Identical)

**Upstream:** atom-upstream/backend/core/benchmarks.py

**SaaS:** backend-saas/core/benchmarks.py

**Implementation:** Both use **STATIC HARDCODED** scores

**Source:**

"""
Curated Quality Scores for AI Models
Normalized 0-100 scale based on MMLU, GSM8K, HumanEval, and LMSYS Chatbot Arena.
Updated Jan 2026
"""
MODEL_QUALITY_SCORES = {
    "gemini-3-pro": 100,      # ❌ HARDCODED
    "gpt-5": 99,              # ❌ HARDCODED
    "claude-4-opus": 99,       # ❌ HARDCODED
    # ... 50+ models
}
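Any consumer of this table needs a safe lookup, since routing sees model names that may not be curated yet. A minimal sketch; the neutral default of 50 is illustrative and not from either codebase:

```python
MODEL_QUALITY_SCORES = {
    "gemini-3-pro": 100,
    "gpt-5": 99,
    "claude-4-opus": 99,
}


def quality_score(model: str, default: int = 50) -> int:
    """Return the curated score, falling back to a neutral default
    so routing never crashes on a model missing from the table."""
    return MODEL_QUALITY_SCORES.get(model, default)
```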

**Problems:**

  • Scores become outdated as new models are released
  • Manual updates required when benchmarks change
  • No automatic synchronization with leaderboard APIs

**Missing Features:**

  • ❌ LMSYS Chatbot Arena API integration
  • ❌ MMLU/GSM8K/HumanEval API integration
  • ❌ Automatic benchmark updates
  • ❌ Real-time leaderboard polling

**Note:** This is understandable since benchmark leaderboards don't always have public APIs, and manual curation provides quality control.

---

4. Feature Comparison Table

| Feature | Upstream | SaaS | Gap |
|---|---|---|---|
| **Cost Tracking** | | | |
| Dynamic pricing via LiteLLM API | ✅ | ❌ | **HIGH PRIORITY** |
| Dynamic pricing via OpenRouter API | ✅ | ❌ | **MEDIUM** |
| Local pricing cache (24h TTL) | ✅ | ❌ | **HIGH** |
| Prompt caching support detection | ✅ | ❌ | **MEDIUM** |
| Cache min-threshold tracking | ✅ | ❌ | **LOW** |
| Provider cost comparison | ✅ | ❌ | **MEDIUM** |
| Cheapest model discovery | ✅ | ❌ | **LOW** |
| Estimated pricing flags | ✅ | ❌ | **LOW** |
| **Health Monitoring** | | | |
| Internal success/error tracking | ✅ | ✅ | None |
| Internal latency tracking | ✅ | ✅ | None |
| Sliding window (5 min) | ✅ | ❌ (1h TTL in Redis) | **LOW** |
| External provider status page checks | ❌ | ❌ | **FUTURE** |
| **Benchmarks** | | | |
| Static quality scores | ✅ | ✅ | None |
| Manual curation | ✅ | ✅ | None |
| External leaderboard APIs | ❌ | ❌ | **FUTURE** |

---

5. Impact Analysis

Business Impact

**SaaS Gaps:**

  1. **Stale Pricing** - If OpenAI/Anthropic change prices, SaaS cost estimates stay stale until new code is deployed
  2. **Missed Savings** - No cache-aware routing optimization (forgoing up to ~90% savings on cached input tokens, depending on provider)
  3. **Manual Updates** - DevOps required to update pricing in code

**Upstream Advantages:**

  1. **Auto-Updating** - Pricing updates every 24 hours from LiteLLM (community-maintained)
  2. **Cost Optimization** - Can route to cheapest models dynamically
  3. **Cache Savings** - Prompt caching support can reduce input-token costs by up to 90% for applicable models

Technical Debt

**SaaS Technical Debt:**

  • Missing dynamic_pricing_fetcher.py (~400 lines)
  • No pricing cache infrastructure
  • BYOKHandler doesn't use dynamic pricing
  • No cache-aware routing in LLM service

**Estimated Effort to Port:**

  • Copy dynamic_pricing_fetcher.py → 2 hours
  • Remove SaaS-specific patterns → 1 hour
  • Integrate with BYOKHandler → 2 hours
  • Add pricing refresh cron job → 1 hour
  • Testing & validation → 2 hours
  • **Total: ~8 hours**

---

6. Recommendations

Priority 1: Port DynamicPricingFetcher (HIGH VALUE)

**Actions:**

  1. Copy atom-upstream/backend/core/dynamic_pricing_fetcher.py to SaaS
  2. Remove hard-coded costs from embedding providers
  3. Integrate with BYOKHandler for cost estimation
  4. Add background task to refresh pricing every 24 hours
  5. Add tenant isolation to pricing cache (multi-tenancy requirement)
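Step 4 could be a Celery beat task or, more simply, an asyncio background loop started at app startup. A sketch of the loop variant; the fetcher interface matches the upstream usage shown earlier, everything else is illustrative:

```python
import asyncio


async def pricing_refresh_loop(fetcher,
                               interval_seconds: float = 24 * 3600) -> None:
    """Periodically force-refresh pricing; swallow errors so one failed
    fetch doesn't kill the loop (the 24h cache covers the gap)."""
    while True:
        try:
            await fetcher.refresh_pricing(force=True)
        except Exception:
            pass  # a real implementation would log and emit a metric
        await asyncio.sleep(interval_seconds)
```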

**Benefits:**

  • Automatic pricing updates (no code deployments needed)
  • Access to 100+ models with accurate pricing
  • Cache-aware routing for up to 90% savings on cached input tokens
  • Provider cost comparison for optimization

Priority 2: Enhance Health Monitoring (MEDIUM VALUE)

**Actions:**

  1. Consider porting upstream's sliding window approach (prevents memory leaks)
  2. Add external provider status page checks (optional enhancement)
  • OpenAI: https://status.openai.com/api/v2/status.json
  • Anthropic: (no public API, but could scrape status page)
  • Google: https://status.cloud.google.com

**Benefits:**

  • Proactive provider health detection
  • Faster recovery from provider outages
  • Better routing decisions with real-time data

Priority 3: Benchmark Updates (LOW VALUE)

**Actions:**

  1. Keep manual curation (quality control)
  2. Set quarterly review schedule to update benchmarks
  3. Consider adding "last_updated" timestamp to track freshness

**Rationale:**

  • Benchmark leaderboards don't always have public APIs
  • Manual curation prevents low-quality data from entering system
  • Upstream uses the same approach, suggesting this is acceptable

---

7. Implementation Plan

Phase 1: Port DynamicPricingFetcher

**Tasks:**

  1. Copy dynamic_pricing_fetcher.py from upstream
  2. Add SaaS-specific modifications:
  • Remove local file cache (use Redis instead)
  • Add tenant_id isolation for pricing queries
  • Add tenant-scoped pricing overrides (enterprise feature)
  3. Update embedding providers to use dynamic pricing
  4. Integrate with BYOKHandler
  5. Add pricing refresh cron job (Celery task)
  6. Write unit tests

**Estimated Time:** 1-2 days

Phase 2: Cache-Aware Routing

**Tasks:**

  1. Add prompt caching support to BYOKHandler
  2. Implement cache min-threshold checks
  3. Update routing logic to prefer cached models
  4. Track cache hit/miss metrics
  5. Add cost savings analytics
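Tasks 4-5 need little more than a per-model counter. A minimal sketch; class and method names are illustrative, not from either codebase:

```python
from collections import Counter


class CacheMetrics:
    """Track per-model prompt-cache hits/misses for savings analytics."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, model: str, cache_hit: bool) -> None:
        self.counts[(model, "hit" if cache_hit else "miss")] += 1

    def hit_rate(self, model: str) -> float:
        hits = self.counts[(model, "hit")]
        total = hits + self.counts[(model, "miss")]
        return hits / total if total else 0.0
```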

**Estimated Time:** 2-3 days

Phase 3: Health Monitoring Enhancements (Optional)

**Tasks:**

  1. Port sliding window approach from upstream
  2. Add provider status page polling (OpenAI, Anthropic)
  3. Update health score calculation to include external status
  4. Add alerting for provider degradation

**Estimated Time:** 1-2 days

---

8. Risk Assessment

Risks of Porting DynamicPricingFetcher

**Low Risk:**

  • Well-tested code from upstream
  • No breaking changes to existing APIs
  • Cache fallback if external APIs fail

**Medium Risk:**

  • Dependency on external GitHub/OpenRouter availability
  • Rate limiting on external APIs
  • Need to handle API failures gracefully

**Mitigations:**

  • Use 24-hour cache (retries have 24 hours to succeed)
  • Store fallback pricing in database
  • Graceful degradation to hardcoded costs if APIs fail
  • Monitor API call success rates

SaaS-Specific Considerations

**Multi-Tenancy:**

  • Pricing cache should be global (not per-tenant)
  • Enterprise tenants may have custom pricing overrides
  • Consider pricing tiers for different plans

**Billing:**

  • Dynamic pricing affects cost estimates
  • Need to track actual vs estimated costs
  • Consider margin protection (pricing updates shouldn't break margins)

---

9. Testing Strategy

Unit Tests

```python
import pytest

from core.dynamic_pricing_fetcher import DynamicPricingFetcher


@pytest.mark.asyncio
async def test_fetch_litellm_pricing():
    fetcher = DynamicPricingFetcher()
    pricing = await fetcher.fetch_litellm_pricing()
    assert "gpt-4o" in pricing
    assert pricing["gpt-4o"]["input_cost_per_token"] > 0


@pytest.mark.asyncio
async def test_cache_expiration():
    fetcher = DynamicPricingFetcher()
    await fetcher.refresh_pricing()
    # Freshly refreshed cache should be within the 24-hour TTL
    assert fetcher._is_cache_valid()


@pytest.mark.asyncio
async def test_external_api_failure():
    # Graceful degradation when both external APIs are down
    fetcher = DynamicPricingFetcher()
    # Mock API failures here (e.g. with respx or aioresponses)
    pricing = await fetcher.refresh_pricing()
    # Should return cached pricing or an empty dict, never raise
    assert isinstance(pricing, dict)
```

Integration Tests

```python
import pytest


@pytest.mark.asyncio
async def test_byok_uses_dynamic_pricing():
    # Verify BYOKHandler delegates cost estimation to DynamicPricingFetcher
    handler = BYOKHandler(tenant_id="test")
    cost = await handler.estimate_cost("gpt-4o", 1000, 500)
    assert cost > 0


@pytest.mark.asyncio
async def test_pricing_refresh_background_task():
    # Test the Celery task for pricing refresh
    # Verify pricing is refreshed when the 24-hour TTL expires
    pass
```

---

10. Next Steps

Immediate Actions

  1. **Review upstream implementation** - Read atom-upstream/backend/core/dynamic_pricing_fetcher.py fully
  2. **Assess SaaS requirements** - Confirm tenant isolation, billing, and quota needs
  3. **Create implementation plan** - Detailed tasks with acceptance criteria
  4. **Get approval** - Present gap analysis to stakeholders for prioritization

If approved, start with **Phase 1: Port DynamicPricingFetcher** as it provides the highest business value with manageable risk.

**Files to Copy:**

  • atom-upstream/backend/core/dynamic_pricing_fetcher.py

**Files to Modify:**

  • backend-saas/core/llm/embedding/providers.py (remove hardcoded costs)
  • backend-saas/core/llm/byok_handler.py (integrate dynamic pricing)
  • backend-saas/main_api_app.py (add pricing refresh endpoint)

**New Files:**

  • backend-saas/tests/unit/test_dynamic_pricing_fetcher.py
  • backend-saas/core/tasks/pricing_refresh_task.py (Celery task)

---

Appendix: Code Samples

Example: Dynamic Pricing Integration

```python
# Current SaaS (hardcoded)
class OpenAIEmbeddingProvider(BaseEmbeddingProvider):
    MODELS = {
        "text-embedding-3-small": {
            "cost_per_1m_tokens": 0.02,  # Hardcoded
        },
    }


# After porting (dynamic)
class OpenAIEmbeddingProvider(BaseEmbeddingProvider):
    def __init__(self, api_key: str | None = None):
        super().__init__(api_key)
        from core.dynamic_pricing_fetcher import get_pricing_fetcher
        self.pricing_fetcher = get_pricing_fetcher()

    def estimate_cost(self, text: str, model: str) -> float:
        pricing = self.pricing_fetcher.get_model_price(model)
        if pricing:
            tokens = self._estimate_tokens(text)
            return pricing["input_cost_per_token"] * tokens
        # Fallback to hardcoded cost
        return super().estimate_cost(text, model)
```

Example: Cache-Aware Routing

```python
# After Phase 2 implementation
from core.dynamic_pricing_fetcher import get_pricing_fetcher

fetcher = get_pricing_fetcher()


def select_model(estimated_tokens: int) -> str:
    # Check if the preferred model supports prompt caching
    if fetcher.model_supports_cache("gpt-4o"):
        min_tokens = fetcher.get_cache_min_tokens("gpt-4o")
        if estimated_tokens >= min_tokens:
            # Prompt is large enough to benefit from caching discounts
            return "gpt-4o"
    # Otherwise fall back to the cheaper non-cached model
    return "gpt-4o-mini"
```

---

Conclusion

**Critical Gap:** SaaS is missing DynamicPricingFetcher that provides real-time cost data from LiteLLM and OpenRouter APIs.

**Impact:** High business value - automatic pricing updates, cache-aware routing, cost optimization.

**Recommendation:** Port dynamic_pricing_fetcher.py from upstream as Phase 1, followed by cache-aware routing in Phase 2.

**Estimated Effort:** 3-5 days total for full implementation including testing.

---

*Generated: 2026-04-01*

*Comparison: atom-saas (SaaS) vs atom-upstream (Open Source)*

*Focus: External API integrations for costs, health monitoring, and benchmarks*